OpenVLA policy integration #10
base: main
Conversation
Thanks for the contribution! Could you post the success rates for each task? Could you also post a source for the implementation? Code comments:
|
Hi xuanlinli17, I have corrected all the typos. The implementation of |
I see. Might want to get help from the official authors to validate / revise the implementation, as the Bridge results are near zero for some reason, and pick coke can has large variance across different backgrounds. Additionally, it's possible that OpenVLA might not follow the Octo implementation in real deployment. |
Also, you can modify https://github.com/simpler-env/SimplerEnv/blob/main/tools/calc_metrics_evaluation_videos.py to quickly summarize the results for OpenVLA (just put dummy numbers for the real numbers and don't push the script). You can ignore the NaNs. |
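For readers unfamiliar with that script, here is only a hypothetical illustration of what "dummy numbers for the real numbers" means: register the OpenVLA checkpoint alias next to the existing policies and give it placeholder real-robot numbers so the simulation summary can still be computed. The variable names below are placeholders, not the script's actual identifiers.

```python
# Hypothetical illustration only; names are placeholders and may not match
# tools/calc_metrics_evaluation_videos.py exactly.
ckpt_alias_keys = ["rt-1-x", "octo-base", "openvla-7b"]  # add the OpenVLA alias

# Real-robot success rates are only needed for the sim-vs-real correlation
# metrics, so dummy values are fine for a local summary run (don't push them).
pick_coke_can_real_success = {
    "rt-1-x": 0.0,       # keep whatever real numbers the script already has
    "octo-base": 0.0,
    "openvla-7b": 0.0,   # dummy placeholder for OpenVLA
}
```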
I'm also working on implementing OpenVLA into SimplerEnv, and I had the same issue: OpenVLA fails drastically on Bridge. I wonder if that has anything to do with the controller mentioned in #11 |
Same here: severe lack of performance of OpenVLA on the WidowX robot. |
I checked with the authors and I don't think there is action ensembling or action history. Here is the updated code, which you can try:
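(The snippet originally attached here is not preserved in this thread; the following is a minimal sketch of single-step OpenVLA inference without action ensembling or history, based on the public openvla/openvla-7b HuggingFace quickstart. The helper name `predict_single_action` and the `unnorm_key` default are illustrative, not this PR's code.)

```python
# Minimal sketch: one OpenVLA action per call, no ensembling, no action history.
import torch
from PIL import Image
from transformers import AutoModelForVision2Seq, AutoProcessor

processor = AutoProcessor.from_pretrained("openvla/openvla-7b", trust_remote_code=True)
vla = AutoModelForVision2Seq.from_pretrained(
    "openvla/openvla-7b",
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    trust_remote_code=True,
).to("cuda:0")

def predict_single_action(image: Image.Image, instruction: str, unnorm_key: str = "bridge_orig"):
    """Return a single 7-DoF action (xyz delta, rotation delta, gripper)."""
    prompt = f"In: What action should the robot take to {instruction.lower()}?\nOut:"
    inputs = processor(prompt, image).to("cuda:0", dtype=torch.bfloat16)
    # predict_action un-normalizes with the statistics of the chosen training mix;
    # use the key that matches the robot being evaluated.
    return vla.predict_action(**inputs, unnorm_key=unnorm_key, do_sample=False)
```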
|
OpenVLA setup requirements:
Please add these instructions to the README and add an "OpenVLA Inference Setup" section |
Typos in |
Hello @xuanlinli17, I tried the code above, but OpenVLA still fails on the WidowX tasks. Is it possibly an implementation problem? Should I set any params for WidowX? |
Yeah, that's my finding too, but I don't think the authors applied any special treatment to evaluate OpenVLA on Bridge. There might be some coordinate transform on Bridge that is different. |
@xuanlinli17 Thank you! I will continue to look into this. If I find any additional information or solutions, I'll make sure to share it with you. |
When I run the scripts
How to fix it? |
^ Run |
I have made a run; here is the full result of OpenVLA. I used my branch as the codebase: https://github.com/hilookas/SimplerEnv (note: the "real" result is set to 0)
|
For Google Robot pick coke can, looks like the variant aggregation eval of OpenVLA is a lot better than visual matching, which is interesting...
|
@hilookas Thanks for providing the results! |
@hilookas Thank you for your great work! I want to try out OpenVLA in the simulator as well. But I wonder why the performance in sim is not as good as what the paper claims on the real-world benchmark. It should not be due to the sim-to-real gap, right? Because SIMPLER is designed to mitigate this gap. |
@QuanyiLi OpenVLA did 5 trials in real for each task (and there are no grid-based evals with >= 50 trials per task for Google Robot like in Simpler). Task settings like the cabinets and backgrounds used in the real world can also be different. We are requesting paired sim-real evaluation from Google following Simpler's protocol (and the same backgrounds, cabinets, etc.). |
Thanks. Look forward to the updated results! |
Sure! I have updated the result. Please see the log above. My result is slightly different, but not by much, from xuanlinli17's run in
|
How much memory is needed to run OpenVLA? I tried a 3090 and a 40 GB A100, but both go out of memory. |
@yxchng It takes about 15 GB of VRAM on my 4090, following the official instructions to use bf16. |
A single 3090 is enough. Just remember not to open more than one env and inference process at a time. |
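Both figures above line up with bf16: 7B parameters at 2 bytes each is about 14 GB of weights before activations, so a 24 GB card works if only one env/inference process is open. A small sanity check of the footprint, loading the model as in the public quickstart:

```python
import torch
from transformers import AutoModelForVision2Seq

# Load in bf16 (as in the quickstart) and report the actual GPU memory footprint.
vla = AutoModelForVision2Seq.from_pretrained(
    "openvla/openvla-7b",
    torch_dtype=torch.bfloat16,   # omitting this loads fp32, which will not fit in 24 GB
    low_cpu_mem_usage=True,
    trust_remote_code=True,
).to("cuda:0")

weights_gb = sum(p.numel() * p.element_size() for p in vla.parameters()) / 1e9
print(f"parameter memory: {weights_gb:.1f} GB")                    # ~14-15 GB in bf16
print(f"cuda allocated:   {torch.cuda.memory_allocated() / 1e9:.1f} GB")
```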
@hilookas I would like to know where the tables come from? I did not see them in the OpenVLA paper. |
I made them :D based on my experiments above. If you have results from another run, please let me know! |
I do not have enough GPU resources attached to a screen, which the SAPIEN simulator requires :-( Running OpenVLA locally is quite a burden for most consumer-level PCs. |
It easily succeeds on the Google Robot tasks BUT always fails on the WidowX-related tasks. @xuanlinli17 |
OpenVLA on WidowX in Simpler is known to have some strange behaviors, and I don't yet know why... Would you investigate too? |
Please follow the troubleshooting section in the README; if the issue still persists, please open a new issue & discussion. |
@xuanlinli17 Maybe I didn't make myself clear; what I meant was that I ran the bash file inside the scripts folder, but the robot simulation screen didn't appear. What am I supposed to do to make the picture appear? |
@xuanlinli17 @DelinQu Hi! I am trying to run OpenVLA on SimplerEnv and I am getting an error when I run it. Here is the error message.
and here is my GPU information
any insight on this would be greatly appreciated! |
^ I think you need a local CUDA version >= 11.6, probably matching your torch version. |
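A quick way to check whether the local toolkit matches what PyTorch was built against (a small diagnostic sketch; nvcc must be on PATH for the second check):

```python
import subprocess
import torch

# CUDA version PyTorch was compiled against, and whether a GPU is visible.
print("torch built with CUDA:", torch.version.cuda)
print("GPU available:", torch.cuda.is_available())

# Locally installed CUDA toolkit (should be >= 11.6 per the suggestion above).
try:
    out = subprocess.run(["nvcc", "--version"], capture_output=True, text=True)
    print(out.stdout)
except FileNotFoundError:
    print("nvcc not found; install a CUDA toolkit >= 11.6 or add it to PATH")
```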
When I run
My computer has an RTX 4090 and runs Ubuntu. |
Hello, I'm curious why you didn't add 'In: What action should the robot take to {INSTRUCTION}?\nOut:' when passing prompts to the processor. Would adding or not adding this sentence affect the results? |
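For reference, that template is the prompt format from the OpenVLA quickstart; the two variants can be compared directly. A sketch below, with a blank stand-in image so it runs without the simulator:

```python
from PIL import Image
from transformers import AutoProcessor

processor = AutoProcessor.from_pretrained("openvla/openvla-7b", trust_remote_code=True)
image = Image.new("RGB", (224, 224))            # stand-in for the camera frame
instruction = "put the spoon on the towel"

# With the documented template (lower-cased instruction):
prompt = f"In: What action should the robot take to {instruction.lower()}?\nOut:"
inputs_with_template = processor(prompt, image)

# Without the template, the model conditions on different text tokens, which can
# change the generated action tokens and therefore the predicted action.
inputs_raw = processor(instruction, image)
```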
Hi guys, I would like to ask: is it inadvisable to evaluate OpenVLA on the SIMPLER simulation platform, given that WidowX performed very poorly on the Bridge tasks? Is there a better way to reproduce the results presented in the OpenVLA paper? |
OpenVLA didn't apply any augmentation during training, compared to other models like RT-*. This could explain why it performs poorly in Simpler on WidowX (even though its result on Google Robot is fine). |
This pull request integrates the OpenVLA policy. The evaluation scripts remain consistent with the original repo under ./scripts/.
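For reviewers who want to sanity-check the policy outside the bash scripts, an episode can also be driven through the simpler_env Python API. This is a sketch following the SimplerEnv README's example loop; the task name is one of the registered WidowX tasks, and the random action is a placeholder for the OpenVLA inference shown earlier in the thread.

```python
import simpler_env
from simpler_env.utils.env.observation_utils import get_image_from_maniskill2_obs

env = simpler_env.make("widowx_spoon_on_towel")       # or a google_robot_* task
obs, reset_info = env.reset()
instruction = env.get_language_instruction()

done, truncated = False, False
while not (done or truncated):
    image = get_image_from_maniskill2_obs(obs, env)   # RGB frame for the policy
    # Placeholder: swap in OpenVLA inference (see the earlier snippets) here.
    action = env.action_space.sample()
    obs, reward, done, truncated, info = env.step(action)

print("episode info:", info)
```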